SOME RESULTS ON SUBSET SELECTION
PROCEDURES FOR DOUBLE EXPONENTIAL POPULATIONS*

by

Shanti S. Gupta and Yoon-Kwai Leong
Purdue University

1. Introduction

In this paper we study the selection problems and some other related statistical inference problems for k double exponential (Laplace) populations. Before we do this, we give some discussion of the Laplace distribution, its characteristics (vs. normal, logistic and Cauchy) and its use as a model in statistics and probability.

The double exponential distribution arises as a model in some statistical problems as explained later. This distribution is also considered in robustness studies, which suggests that it provides a model with different characteristics than some of the other commonly used models such as the normal distribution. In particular, the tails of the double exponential distribution are thicker than the tails of the normal or logistic, but not as thick as the Cauchy (see Hajek [14]). Yet the double exponential has not been used very extensively as a model. This could be due in part to the lack of available statistical techniques for this distribution, although it is likely that the experimenter has shied away from using the double exponential because it has a sharp peak in the center. However, many applications would be primarily concerned with tail probabilities, and it would seem that the double exponential would be a useful model if exponential tails are required.

*This research was supported by the Office of Naval Research contract N00014-75-C-0455 at Purdue University. Reproduction in whole or in part is permitted for any purpose of the United States Government.

The double exponential has some application as a model in the area of Actuarial Science, and it has been suggested as a model for the distribution of the strength of flaws in materials by Epstein [8]. Using the weakest link principle, the strength of the material should decrease as the number of flaws or volume increases. In particular, from extreme-value theory the double exponential assumption leads to the result that the mode or most probable strength decreases in proportion to log n, where n represents the size or number of flaws of the material. In comparison, the assumption of a normal model leads to a decrease in proportion to $(\log n)^{1/2}$. For most applications to material strength, only the minimum flaw strength would ordinarily be observable; however, Epstein [8] suggests that there may be many other types of problems, such as a system of components in series, which might be similar from a statistical point of view. Other possible applications of the double exponential are suggested by the fact that the difference of two independent (not necessarily identical) two parameter exponential variables follows the double exponential distribution, and that the logarithm of the ratios of uniform or Pareto variables follows the double exponential distribution.

In classical theory, once having assumed the form of the parent distribution, we can derive a criterion which is appropriate to this assumption. For example, under the assumption of normality, for the comparison of two means we would derive the t-statistic. It is then customary to justify the use of such a normal theory criterion in the practical circumstance in which normality cannot be guaranteed by arguing that the distribution of the characteristic is but little affected by non-normality of the parent distribution - that is, it is robust under non-normality. However, this argument ignores the fact that if the parent distribution really differed from the normal, the appropriate criterion would no longer be the normal-theory statistic.

Box and Tiao [4] reconsidered the analysis of Darwin's paired data on the heights of self- and cross-fertilized plants quoted by Fisher in "The Design of Experiments" (1935). In this development the parent distribution is not assumed to be normal, but only a member of the following class of symmetric distributions

$$p(y \mid \theta, \sigma, \beta) = \frac{1}{\Gamma[1+\frac{1}{2}(1+\beta)]\, 2^{1+\frac{1}{2}(1+\beta)}\, \sigma} \exp\left\{-\frac{1}{2}\left|\frac{y-\theta}{\sigma}\right|^{2/(1+\beta)}\right\} \tag{1.1}$$

where $-\infty < y < \infty$, $0 < \sigma < \infty$, $-\infty < \theta < \infty$, $-1 < \beta \le 1$. This class of distributions includes the normal ($\beta = 0$) and the double exponential ($\beta = 1$), and its kurtosis parameter is $\beta$.

If the probability density function of the double exponential is given by

$$f(x; \theta, \sigma) = \frac{1}{2\sigma}\, e^{-|x-\theta|/\sigma}, \quad -\infty < x < \infty,\ -\infty < \theta < \infty,\ \sigma > 0, \tag{1.2}$$

then the mode of the distribution is $x = \theta$, where it has a sharp peak.

The expected value and standard deviation of (1.2) are $\theta$ and $\sqrt{2}\,\sigma$ respectively. Moments of the standardized double exponential order statistics can be obtained by using the closed-form expressions for the moments of the standardized negative exponential order statistics derived by Epstein and Sobel [9]. Govindarajulu [10] has given the expressions for these moments.

Chew [6] gives the graphs of the standardized density functions of normal, logistic and double exponential distributions, from which it is clear that the tails of the double exponential distribution are thicker than those of the normal or logistic, in the sense that the curve of the double exponential is above that of the others to the left
and  right  of  some  points.  In  the  case  of  the  normal  distribution  this 
point  is  2.6^4. 

If the cumulative distribution functions

$$G_1(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$$

and

$$G_2(x) = \begin{cases} \frac{1}{2}\, e^{\sqrt{2}\,x}, & x < 0 \\ 1 - \frac{1}{2}\, e^{-\sqrt{2}\,x}, & x \ge 0 \end{cases}$$

of the standardized normal and double exponential distributions are compared (and, similarly, the standardized logistic $G_3(x) = 1/(1 + e^{-\pi x/\sqrt{3}})$ and the double exponential distribution), the differences $G_2(x) - G_1(x)$ (as well as $G_2(x) - G_3(x)$) vary in the way shown in the graph below. Since $G_1(x)$, $G_2(x)$ and $G_3(x)$ are symmetric about x = 0, only the values for $x \ge 0$ are shown.
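The comparison of the three standardized distribution functions can be tabulated numerically. The following is a minimal sketch, assuming the unit-variance forms of $G_1$, $G_2$ and $G_3$ given above; the function name `cdf_differences` and the grid of x values are illustrative choices, not part of the original study.

```python
import math

SQRT2 = math.sqrt(2.0)

def G1(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def G2(x):
    """Cdf of the double exponential scaled to unit variance."""
    if x < 0:
        return 0.5 * math.exp(SQRT2 * x)
    return 1.0 - 0.5 * math.exp(-SQRT2 * x)

def G3(x):
    """Cdf of the logistic scaled to unit variance."""
    return 1.0 / (1.0 + math.exp(-math.pi * x / math.sqrt(3.0)))

def cdf_differences(xs):
    """Return (x, G2 - G1, G2 - G3) for each x in xs."""
    return [(x, G2(x) - G1(x), G2(x) - G3(x)) for x in xs]

# Illustrative grid; only x >= 0 is needed because of symmetry about 0.
for x, d_normal, d_logistic in cdf_differences([0.0, 0.5, 1.0, 2.0, 3.0, 4.0]):
    print(f"x={x:4.1f}  G2-G1={d_normal:+.5f}  G2-G3={d_logistic:+.5f}")
```

A negative difference far out in the upper tail means that the double exponential leaves more probability beyond x than the distribution it is compared with.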

With regard to point estimation, it is well known that the maximum likelihood estimates based on the complete sample of size n are given by $\hat{\theta} = \tilde{X}$ and $\hat{\sigma} = \frac{1}{n} \sum_{i=1}^{n} |X_i - \tilde{X}|$, where $\tilde{X}$ denotes the sample median.

Also best linear estimators (based on order statistics) under symmetric censoring are given by Govindarajulu [11] for sample sizes up to 20, and some alternate estimates are suggested by Raghunandanan and Srinivasan [16]. Interval estimation for the parameters of the two-parameter double exponential distribution is considered by Bain and Engelhardt [1].
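The maximum likelihood estimates can be computed directly from a sample. This is a small sketch under the complete-sample setting described above; the function name `laplace_mle` and the data values are hypothetical and serve only as an illustration.

```python
import statistics

def laplace_mle(sample):
    """Maximum likelihood estimates (theta_hat, sigma_hat) for a
    double exponential sample: the sample median and the mean
    absolute deviation about that median."""
    theta_hat = statistics.median(sample)
    sigma_hat = sum(abs(x - theta_hat) for x in sample) / len(sample)
    return theta_hat, sigma_hat

# Hypothetical observations, not taken from the paper.
sample = [-1.2, 0.3, 0.4, 0.9, 2.6]
theta_hat, sigma_hat = laplace_mle(sample)
print(theta_hat, sigma_hat)
```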

Now  we  discuss  the  problem  of  comparison  of  k(>  2)  double 
exponential  distributions.  First  we  study  the  selection  problem  for 
the  largest  mean  (location). 

2. Selecting a Subset Containing the Best of Several Double
Exponential Populations with Respect to the Location Parameter

(A)  Formulation  of  the  Problem 

Let $X_i$, i = 1,2,...,k, be k independent random variables from double exponential populations $\pi_i$, i = 1,2,...,k, respectively, with probability density function

$$f(x; \theta_i, \sigma) = \frac{1}{2\sigma} \exp[-|x-\theta_i|/\sigma], \quad -\infty < x < \infty,\ -\infty < \theta_i < \infty,\ \sigma > 0,$$

where $\sigma$ is a common, known constant for each of the k populations. We may, without loss of generality, assume $\sigma$ to be one. The ranked parameters are denoted by $\theta_{[1]} \le \theta_{[2]} \le \cdots \le \theta_{[k]}$. As before, it is assumed that there is no a priori information available about the correct pairing of the ordered $\theta_{[i]}$ and the k given populations from which observations are taken. Any population whose parameter value equals $\theta_{[k]}$ will be defined as a best population. A correct selection (CS) is defined as the selection of any subset of the k given populations which contains at least one best population.

Suppose we take (2n+1) independent observations from $\pi_i$, i = 1,2,...,k; the sample size (2n+1) is assumed to be given in the primary problem below. Let $P^*$ ($1/k < P^* < 1$) be a preassigned constant. Let $P(CS; k, n, \theta, R)$ denote the probability of a correct selection when the procedure R is used with the given k, n and when the true configuration of parameter values is $\theta = (\theta_1, \ldots, \theta_k)$; let the space of all possible values of $\theta$ be denoted by $\Omega$.

The problem of primary interest is to define a procedure R which selects a subset of the k given populations that is small, never empty, and large enough so that it contains the best population with probability at least $P^*$, regardless of the true configuration $\theta$ in $\Omega$, i.e., so that

$$\inf_{\theta \in \Omega} P(CS; k, n, \theta, R) \ge P^*. \tag{2.1}$$

After having defined a particular procedure R = R(k, n, P*) for each possible set of values of k, n and P*, we discuss the expected size E(S; k, n, P*, R) of the selected subset when the procedure R is used with the given k, n, P* and where $\theta$ is the true parameter configuration in $\Omega$.

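Let $Y_i$ denote the sample median of the (2n+1) observations $X_{i1}, \ldots, X_{i,2n+1}$ from the i-th population, and let $Y_{(i)}$ denote that unknown variable which is associated with $\theta_{[i]}$. The probability density $g_n(\cdot)$ and the cumulative distribution $G_n(\cdot)$ of $Y_i$ are given by

$$g_n(y; \theta_i) = \frac{(2n+1)!}{n!\,n!} \left(\tfrac{1}{2} e^{-|y-\theta_i|}\right)^{n+1} \left(1 - \tfrac{1}{2} e^{-|y-\theta_i|}\right)^{n} \tag{2.2}$$

$$G_n(y; \theta_i) = \begin{cases} 1 - \sum_{j=0}^{n} \binom{2n+1}{j} \left(\tfrac{1}{2} e^{y-\theta_i}\right)^{j} \left(1 - \tfrac{1}{2} e^{y-\theta_i}\right)^{2n+1-j}, & y < \theta_i \\[4pt] \sum_{j=0}^{n} \binom{2n+1}{j} \left(\tfrac{1}{2} e^{-(y-\theta_i)}\right)^{j} \left(1 - \tfrac{1}{2} e^{-(y-\theta_i)}\right)^{2n+1-j}, & y \ge \theta_i \end{cases} \tag{2.3}$$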
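The density and distribution function of the sample median are easy to evaluate numerically. The sketch below assumes unit scale and implements (2.2) together with the binomial-sum form of (2.3); the function names `g_n` and `G_n` mirror the paper's symbols but are otherwise an illustrative choice.

```python
import math

def g_n(y, n, theta=0.0):
    """Density (2.2) of the median of 2n+1 double exponential
    observations with location theta and unit scale."""
    t = 0.5 * math.exp(-abs(y - theta))
    c = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    return c * t ** (n + 1) * (1.0 - t) ** n

def G_n(y, n, theta=0.0):
    """Cdf (2.3) of the same sample median, written as a binomial sum."""
    m = 2 * n + 1
    t = 0.5 * math.exp(-abs(y - theta))
    s = sum(math.comb(m, j) * t ** j * (1.0 - t) ** (m - j)
            for j in range(n + 1))
    # Below theta, t is the parent cdf; above theta, t is the parent survival function.
    return 1.0 - s if y < theta else s

print(G_n(0.0, 4))   # value of the cdf at the location parameter
print(g_n(0.0, 4))   # height of the density at its peak
```

The two branches of `G_n` agree at `y = theta`, where both reduce to one half by the symmetry of the binomial coefficients.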
Now, we propose the selection procedure $R_1$ as follows:

$R_1$: Retain in the selected subset only those populations $\pi_i$ for which

$$Y_i \ge \max_{1 \le j \le k} Y_j - d \tag{2.4}$$

where d = d(k, n, P*) is the smallest non-negative constant to be determined that will satisfy the basic probability requirement (2.1) for all configurations $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$.

(B)  Probability  of  a Correct  Selection  and  Its  Infimum 

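The following result concerning the rule $R_1$ can be proved.

Theorem 2.1.

$$\inf_{\theta \in \Omega} P_\theta(CS \mid R_1) = \inf_{\theta \in \Omega_0} P_\theta(CS \mid R_1) = \int_{-\infty}^{\infty} G_n^{k-1}(y+d)\, g_n(y)\, dy$$

where $\Omega_0 = \{\theta = (\theta_1, \ldots, \theta_k) : \theta_1 = \theta_2 = \cdots = \theta_k = \theta\}$, and $G_n(y)$, $g_n(y)$ are the cdf and pdf of the sample median of (2n+1) observations from the standard double exponential distribution.

Proof. For $\theta \in \Omega$,

$$P_\theta(CS \mid R_1) = P_\theta\{Y_{(k)} \ge Y_{(j)} - d,\ j = 1, \ldots, k-1\}$$
$$= P_\theta\{Y_{(j)} - \theta_{[j]} \le Y_{(k)} - \theta_{[k]} + \theta_{[k]} - \theta_{[j]} + d,\ j = 1, \ldots, k-1\}$$
$$= \int_{-\infty}^{\infty} \prod_{j=1}^{k-1} \left[ \int_{-\infty}^{y + d + \theta_{[k]} - \theta_{[j]}} g_n(z)\, dz \right] g_n(y)\, dy. \tag{2.5}$$

Note that $\theta_{[k]} - \theta_{[j]} \ge 0$ for j = 1,...,k-1; thus the result follows. Hence, if we choose d to be the smallest constant to satisfy

$$\int_{-\infty}^{\infty} G_n^{k-1}(y+d)\, g_n(y)\, dy = P^*,$$

then we have determined the constant d for which

$$\inf_{\theta \in \Omega} P_\theta(CS \mid R_1) = P^*.$$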
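The defining equation for d can be solved numerically. The sketch below is one possible approach, not the authors' program: it evaluates the integral of Theorem 2.1 with a composite Simpson rule on a truncated range and finds d by bisection. The truncation limits, the number of subintervals and the function names are all assumptions made for illustration. Section 8 quotes a tabulated d = 1.3210 for k = 5, n = 4, P* = 0.95, which offers a reference point for comparison.

```python
import math

def g_n(y, n):
    """Density of the median of 2n+1 standard double exponential observations."""
    t = 0.5 * math.exp(-abs(y))
    c = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    return c * t ** (n + 1) * (1.0 - t) ** n

def G_n(y, n):
    """Cdf of the same sample median."""
    m = 2 * n + 1
    t = 0.5 * math.exp(-abs(y))
    s = sum(math.comb(m, j) * t ** j * (1.0 - t) ** (m - j) for j in range(n + 1))
    return 1.0 - s if y < 0 else s

def simpson(func, a, b, m=4000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def pcs_equal_means(d, k, n):
    """Integral of G_n^(k-1)(y+d) g_n(y) over a truncated range."""
    return simpson(lambda y: G_n(y + d, n) ** (k - 1) * g_n(y, n), -20.0, 20.0)

def solve_d(k, n, p_star, tol=1e-8):
    """Smallest d (up to tol) with pcs_equal_means(d, k, n) >= p_star."""
    lo, hi = 0.0, 1.0
    while pcs_equal_means(hi, k, n) < p_star:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_equal_means(mid, k, n) < p_star:
            lo = mid
        else:
            hi = mid
    return hi

print(solve_d(5, 4, 0.95))
```

Bisection is used because the integral is nondecreasing in d; at d = 0 it equals 1/k, since each of k exchangeable medians is equally likely to be the largest.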


(C) Some Properties of $R_1$

For $\theta \in \Omega$ and $\theta = (\theta_{[1]}, \ldots, \theta_{[k]})$ define $P_\theta(i) = P_\theta\{R \text{ selects population } \pi_{(i)}\}$ and recall the following definitions (see Santner).

Definition 2.1. The rule R is strongly monotone in $\pi_{(i)}$ means

$$P_\theta(i) \text{ is } \begin{cases} \uparrow \text{ in } \theta_{[i]} \text{ when all other components of } \theta \text{ are fixed} \\ \downarrow \text{ in } \theta_{[j]}\ (j \ne i) \text{ when all other components of } \theta \text{ are fixed.} \end{cases}$$

Definition 2.2. R is a monotone procedure means for every $\theta \in \Omega$ and $1 \le i < j \le k$, $P_\theta(i) \le P_\theta(j)$.

Definition 2.3. R is an unbiased procedure means for every $\theta \in \Omega$ and $1 \le j < k$,

$$P_\theta\{R \text{ does not select } \pi_{(j)}\} \ge P_\theta\{R \text{ does not select } \pi_{(k)}\}.$$

Of course, if R is monotone it is also unbiased.

Theorem 2.2. For any i = 1,2,...,k, the procedure $R_1$ is strongly monotone in $\pi_{(i)}$.

Proof. The proof follows easily from the expression

$$P_\theta(i) = \int_{-\infty}^{\infty} \Big\{ \prod_{j \ne i} G_n(y + \theta_{[i]} - \theta_{[j]} + d) \Big\}\, g_n(y)\, dy.$$

Corollary 2.1. The rule $R_1$ is monotone and unbiased.

Proof. It is known and easy to see that if R is strongly monotone in $\pi_{(i)}$ for all i = 1,2,...,k, then it is monotone.

Now we consider some special configurations of $\theta \in \Omega$:

$$\theta_{[i]} = \theta,\ i = 1, 2, \ldots, k-1; \qquad \theta_{[k]} = \theta + \Delta,\ \Delta > 0 \tag{2.8}$$

$$\theta_{[i]} = \theta + (i-1)\Delta,\ \Delta > 0,\ i = 1, 2, \ldots, k. \tag{2.9}$$

Under (2.8),

$$P_\theta(i) = \int_{-\infty}^{\infty} [G_n(y+d)]^{k-2}\, G_n(y+d-\Delta)\, g_n(y)\, dy \quad \text{for } i = 1, 2, \ldots, k-1 \tag{2.10}$$

$$P_\theta(k) = \int_{-\infty}^{\infty} [G_n(y+d+\Delta)]^{k-1}\, g_n(y)\, dy. \tag{2.11}$$

While under (2.9),

$$P_\theta(i) = \int_{-\infty}^{\infty} \Big\{ \prod_{j=1,\, j \ne i}^{k} G_n(y + d + (i-j)\Delta) \Big\}\, g_n(y)\, dy, \quad i = 1, 2, \ldots, k.$$

From the above equations we can make the following remarks:

Remark 2.1. For fixed P*, k, n, i (i = 1,2,...,k-1), the probability of selecting population $\pi_{(i)}$ decreases from P* to zero as $\Delta$ increases from zero to infinity.

Remark 2.2. For fixed P*, k and n, the probability of selecting $\pi_{(k)}$ increases from P* to one as $\Delta$ increases from zero to infinity.

Remark 2.3. For fixed P*, k, i (i = 1,...,k-1) and $\Delta$, the probability of selecting population $\pi_{(i)}$ tends to zero as $n \to \infty$, while the probability of selecting $\pi_{(k)}$ tends to one as $n \to \infty$.

Conclusion: Under either configuration (2.8), (2.9), $E_\theta(S \mid R_1) = \sum_{i=1}^{k} P_\theta(i) \to 1$ as $\Delta \to \infty$ for fixed n, and $E_\theta(S \mid R_1) \to 1$ as $n \to \infty$ for fixed $\Delta$.
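The behaviour described in Remarks 2.1 and 2.2 can be checked numerically under the slippage configuration (2.8). The sketch below is an illustration, assuming a fixed value of d supplied by the user (here the tabulated d = 1.3210 quoted in Section 8 for k = 5, n = 4, P* = 0.95); the integration range, grid size and function names are assumptions.

```python
import math

def g_n(y, n):
    """Density of the median of 2n+1 standard double exponential observations."""
    t = 0.5 * math.exp(-abs(y))
    c = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    return c * t ** (n + 1) * (1.0 - t) ** n

def G_n(y, n):
    """Cdf of the same sample median."""
    m = 2 * n + 1
    t = 0.5 * math.exp(-abs(y))
    s = sum(math.comb(m, j) * t ** j * (1.0 - t) ** (m - j) for j in range(n + 1))
    return 1.0 - s if y < 0 else s

def simpson(func, a, b, m=4000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def selection_probabilities(k, n, d, delta):
    """Return (P(i) for any i < k, P(k)) from (2.10) and (2.11)."""
    p_other = simpson(
        lambda y: G_n(y + d, n) ** (k - 2) * G_n(y + d - delta, n) * g_n(y, n),
        -20.0, 20.0)
    p_best = simpson(
        lambda y: G_n(y + d + delta, n) ** (k - 1) * g_n(y, n),
        -20.0, 20.0)
    return p_other, p_best

def expected_size(k, n, d, delta):
    """Expected subset size: sum of the k selection probabilities."""
    p_other, p_best = selection_probabilities(k, n, d, delta)
    return (k - 1) * p_other + p_best

for delta in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(delta, expected_size(5, 4, 1.3210, delta))
```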


(D) Asymptotic Results for the Procedure $R_1$

It suffices to consider the parameter space $\Omega_0$. For n large, we discuss an asymptotic property of the procedure as follows. Let Y be the sample median from a sample of size (2n+1) with pdf $f(x; \theta) = \frac{1}{2} e^{-|x-\theta|}$, $-\infty < x < \infty$. Then it is known (see Chu [7]) that under $\Omega_0$, $(Y - \theta)/\sigma_n$ is asymptotically normally distributed (here $\sigma_n^2 = 1/(2n+1)$). Let Z denote a random variable which has a standard normal distribution; then $(Y - \theta)/\sigma_n$ is asymptotically distributed as Z. Hence, under $\Omega_0$, the probability

$$Y_1 \ge \max_{2 \le j \le k} Y_j - d$$

is asymptotically the same as the probability

$$Z_1 \ge \max_{2 \le j \le k} Z_j - \sqrt{2n+1}\, d$$

where $Z_i$, i = 1,2,...,k, are iid standard normal variables. Hence,

$$\inf_{\theta \in \Omega} P_\theta(CS \mid R_1) \cong P\{Z_1 \ge \max_{2 \le j \le k} Z_j - \sqrt{2n+1}\, d\} = \int_{-\infty}^{\infty} \Phi^{k-1}(z + \sqrt{2n+1}\, d)\, d\Phi(z)$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.
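The normal approximation gives a quick large-sample value of d. The following sketch solves the asymptotic equation for $h = \sqrt{2n+1}\, d$ by bisection and rescales; the quadrature settings and the names `normal_pcs` and `asymptotic_d` are illustrative assumptions. For small n the result should be read only as a rough approximation to the exact constant.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(func, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def normal_pcs(h, k):
    """Integral of Phi^(k-1)(z + h) dPhi(z) over a truncated range."""
    return simpson(lambda z: Phi(z + h) ** (k - 1) * phi(z), -10.0, 10.0)

def asymptotic_d(k, n, p_star, tol=1e-10):
    """Large-sample approximation to d: solve normal_pcs(h, k) = p_star,
    then divide h by sqrt(2n+1)."""
    lo, hi = 0.0, 1.0
    while normal_pcs(hi, k) < p_star:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_pcs(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return hi / math.sqrt(2 * n + 1)

print(asymptotic_d(5, 4, 0.95))
```

For k = 2 the integral reduces to $\Phi(h/\sqrt{2})$, which gives a simple check of the quadrature.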


(E) The Monotone Likelihood Ratio Property of the Sample Median

Suppose Y is the sample median of (2n+1) observations from the population with double exponential density function $f(x; \theta) = \frac{1}{2} e^{-|x-\theta|}$. The pdf $g_n(y; \theta)$ and cdf $G_n(y; \theta)$ of Y are given by equations (2.2) and (2.3).

After some algebraic computations, we see that $G_n(\theta; \theta) = \frac{1}{2}$; also it is easy to show that $g_n(y; \theta)$ is differentiable at $y = \theta$.

Let $g_n(y; \theta) = g_n(y - \theta)$. It is shown in Lehmann [15, p. 330] that a necessary and sufficient condition for $g_n(y - \theta)$ to have monotone likelihood ratio in y is that $-\log g_n$ is convex. Our main goal in this section is to prove this assertion. Now

$$g_n(y) = C_n \left(\tfrac{1}{2} e^{-|y|}\right)^{n+1} \left(1 - \tfrac{1}{2} e^{-|y|}\right)^{n}, \quad \text{where } C_n = \frac{(2n+1)!}{(n!)^2},$$

so

$$-\log g_n(y) = -\log C_n + (n+1) \log 2 + (n+1)|y| - n \log\left(1 - \tfrac{1}{2} e^{-|y|}\right).$$

Let

$$h(y) = (n+1)|y| - n \log\left(1 - \tfrac{1}{2} e^{-|y|}\right) = \begin{cases} h_1(y), & y < 0 \\ h_2(y), & y \ge 0 \end{cases}$$

which is a continuous function. For y < 0, $h(y) = h_1(y) = -(n+1)y - n \log(1 - \frac{1}{2} e^{y})$, and we have

$$h_1'(y) = -(n+1) + n\, \frac{\frac{1}{2} e^{y}}{1 - \frac{1}{2} e^{y}} < 0, \quad \text{since for } y < 0,\ \frac{\frac{1}{2} e^{y}}{1 - \frac{1}{2} e^{y}} < 1,$$

and

$$h_1''(y) = n\, \frac{\frac{1}{2} e^{y}}{\left(1 - \frac{1}{2} e^{y}\right)^2} > 0.$$

Hence, for y < 0, $h_1(y)$ is a decreasing, convex function. Similarly, for $y \ge 0$, $h(y) = h_2(y) = (n+1)y - n \log(1 - \frac{1}{2} e^{-y})$,

$$h_2'(y) = (n+1) - n\, \frac{\frac{1}{2} e^{-y}}{1 - \frac{1}{2} e^{-y}} > 0, \quad \text{since for } y \ge 0,\ \frac{\frac{1}{2} e^{-y}}{1 - \frac{1}{2} e^{-y}} \le 1,$$

and

$$h_2''(y) = n\, \frac{\frac{1}{2} e^{-y}}{\left(1 - \frac{1}{2} e^{-y}\right)^2} > 0.$$

Hence, for $y \ge 0$, $h_2(y)$ is an increasing, convex function. Note that h(y) is continuous at y = 0, decreasing and convex for y < 0, and increasing and convex for $y \ge 0$. Hence h(y) is a convex function, which implies that $-\log g_n(y)$ is also a convex function.

Theorem 2.3. $g_n(y; \theta)$ has monotone likelihood ratio in y.

(F) Expected Size of the Selected Subset

The procedure $R_1$ satisfies the basic probability requirement (2.1) and is defined by (2.4). Consistent with the basic probability requirement, we would like the size of the selected subset to be small. Now S, the size of the selected subset, is a random variable which takes integer values 1,2,...,k. Hence, one criterion of the efficiency of the procedure $R_1$ is the expected value of the size of the subset. Now, we derive an expression for $E(S \mid R_1)$, the expected size of the selected subset using procedure $R_1$.

$$E(S \mid R_1) = \sum_{i=1}^{k} P\{\text{selecting the population with parameter } \theta_{[i]}\}$$
$$= \sum_{i=1}^{k} P\{Y_{(i)} \ge \max_{1 \le j \le k} Y_{(j)} - d\}$$
$$= \sum_{i=1}^{k} \int_{-\infty}^{\infty} \Big[ \prod_{j \ne i} G_n(y + \theta_{[i]} - \theta_{[j]} + d) \Big]\, g_n(y)\, dy.$$

If we set the m smallest parameters $\theta_{[i]}$ ($1 \le m < k$) equal to a common value $\theta$ (say) and define

$$Q = E(S \mid \theta_{[1]} = \cdots = \theta_{[m]} = \theta) \tag{2.17}$$

then by an analogous argument as in Gupta [13] one can prove the following theorem.

Theorem 2.4. For given k, P* ($1/k < P^* < 1$), the expected size of the selected subset $E(S \mid \theta_{[1]} = \theta_{[2]} = \cdots = \theta_{[m]} = \theta,\ m < k)$ in using the procedure $R_1$ is strictly increasing in $\theta$.

Corollary 2.2.

$$\sup_{\theta \in \Omega} E_\theta(S \mid R_1) = k \int_{-\infty}^{\infty} G_n^{k-1}(y+d)\, g_n(y)\, dy = k P^*.$$

Corollary 2.3. In the subset $\Omega(\delta) = \{\theta : \theta_{[i]} \le \theta_{[k]} - \delta,\ i = 1, 2, \ldots, k-1\}$, the function $E_\theta(S \mid R_1)$ takes on its maximum value when $\theta_{[i]} = \theta_{[k]} - \delta$, i = 1,2,...,k-1, and so

$$\sup_{\theta \in \Omega(\delta)} E_\theta(S \mid R_1) = \int_{-\infty}^{\infty} G_n^{k-1}(y+d+\delta)\, g_n(y)\, dy + (k-1) \int_{-\infty}^{\infty} G_n^{k-2}(y+d)\, G_n(y+d-\delta)\, g_n(y)\, dy.$$


(G) Minimax Property of the Rule $R_1$

Suppose that $y_1, \ldots, y_k$ are the sample medians from the k populations $\pi_1, \ldots, \pi_k$, respectively, and with this set of observations we select the ith population with probability $\phi_i(y_1, \ldots, y_k)$. Then the selection rule R is said to be invariant or symmetric if

$$\phi_i(y_1, \ldots, y_i, \ldots, y_j, \ldots, y_k) = \phi_j(y_1, \ldots, y_j, \ldots, y_i, \ldots, y_k)$$

for all i and j, i.e. if $y_j$ is observed from $\pi_i$ and $y_i$ from $\pi_j$, then we select the jth population with the same probability $\phi_i(y_1, \ldots, y_k)$.

Notice that the rule $R_1$: $Y_i \ge \max_{1 \le j \le k} Y_j - d$ satisfies the equations

$$\inf_{\theta \in \Omega} P_\theta(CS \mid R_1) = \inf_{\theta \in \Omega_0} P_\theta(CS \mid R_1) = P_{\theta_0}(CS \mid R_1) = P^* \tag{2.20}$$

and

$$\sup_{\theta \in \Omega} E_\theta(S \mid R_1) = \sup_{\theta \in \Omega_0} E_\theta(S \mid R_1) = E_{\theta_0}(S \mid R_1) = k P^* \tag{2.21}$$

where $\theta_0 = (\theta_0, \ldots, \theta_0)$.

For any invariant rule $R'$ and $\theta_0 \in \Omega_0$,

$$E_{\theta_0}(S \mid R') = \sum_{i=1}^{k} P_{\theta_0}\{\text{select population } \pi_i \mid R'\}$$
$$= \sum_{i=1}^{k} \int \cdots \int \phi_i(y_1, \ldots, y_k) \Big[ \prod_{j=1}^{k} g_n(y_j; \theta_0) \Big]\, dy_1 \cdots dy_k$$
$$= k\, P_{\theta_0}(CS \mid R').$$

Hence for $\theta_0 \in \Omega_0$,

$$E_{\theta_0}(S \mid R') - E_{\theta_0}(S \mid R_1) = k\, [P_{\theta_0}(CS \mid R') - P_{\theta_0}(CS \mid R_1)]. \tag{2.22}$$

If the rule $R'$ satisfies the basic condition, it follows from (2.20) that the right hand side of (2.22) is non-negative. Thus

$$E_{\theta_0}(S \mid R') \ge E_{\theta_0}(S \mid R_1) = \sup_{\theta \in \Omega} E_\theta(S \mid R_1),$$

so that

$$\sup_{\theta \in \Omega} E_\theta(S \mid R') \ge \sup_{\theta \in \Omega} E_\theta(S \mid R_1),$$

i.e. the rule $R_1$ is minimax among all invariant rules satisfying the P*-condition.


3.  Selecting  the  Population  with  the  Largest  Location 
Parameter  - Indifference  Zone  Approach 
In this section, we would like to use the indifference zone approach of Bechhofer [3] to select one population which is guaranteed to be associated with the largest location parameter with a fixed probability P* whenever the unknown parameters lie outside some subset, or zone of indifference, of the entire parameter space. The goal is to define a sequence of rules $\{R_2(n)\}$ each of which selects a single population and find the smallest n so that

$$P_\theta(CS \mid R_2(n)) \ge P^*, \quad \forall\, \theta \in \Omega(\delta^*) = \{\theta : \theta_{[k]} - \theta_{[k-1]} \ge \delta^*\} \tag{3.1}$$

where P* and $\delta^*$ are preassigned numbers.

For the sake of clarity, we will use the notation $Y_{[k]n}$ to denote the largest of the sample medians each based on (2n+1) observations.

$R_2(n)$: Select the population corresponding to $Y_{[k]n}$.

Let $\Omega_0(\delta^*) = \{\theta : \theta_{[1]} = \cdots = \theta_{[k-1]} = \theta_{[k]} - \delta^*\}$. We have the following theorem.

Theorem 3.1.

$$\inf_{\theta \in \Omega(\delta^*)} P_\theta(CS \mid R_2(n)) = \inf_{\theta \in \Omega_0(\delta^*)} P_\theta(CS \mid R_2(n)).$$

Proof. For $\theta \in \Omega(\delta^*)$,

$$P_\theta(CS \mid R_2(n)) = P_\theta\{Y_{(j)n} \le Y_{(k)n},\ j = 1, \ldots, k-1\}$$
$$= P_\theta\{Y_{(j)n} - \theta_{[j]} \le Y_{(k)n} - \theta_{[k]} + \theta_{[k]} - \theta_{[j]},\ j = 1, \ldots, k-1\}$$
$$= \int_{-\infty}^{\infty} \Big[ \prod_{j=1}^{k-1} G_n(y + \delta_{kj}) \Big]\, dG_n(y) \tag{3.2}$$

where $G_n(y)$ is the cdf of the sample median of (2n+1) independent observations from the standard double exponential distribution with density function $\frac{1}{2} e^{-|x|}$, $-\infty < x < \infty$, and $\delta_{kj} = \theta_{[k]} - \theta_{[j]}$. Hence the infimum of the probability of a correct selection occurs when $\theta_{[1]} = \theta_{[2]} = \cdots = \theta_{[k-1]} = \theta_{[k]} - \delta^*$, provided $\theta_{[k]} - \theta_{[k-1]} \ge \delta^*$. This proves the theorem.

The minimum sample size required to achieve the P* condition (3.1) is the smallest integer n such that

$$\int_{-\infty}^{\infty} [G_n(y + \delta^*)]^{k-1}\, dG_n(y) \ge P^*. \tag{3.3}$$

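4. Selecting the t Best Populations - Indifference Zone Approach

Now, we consider the problem of selecting the best t populations, i.e., the populations with location parameters $\theta_{[k-t+1]}, \ldots, \theta_{[k]}$, without regard to order. We are using the indifference zone approach based on the sample median $Y_i$ of 2n+1 independent observations from population $\pi_i$, i = 1,...,k. Define a sequence of procedures as follows:

$R_3(n)$: Select the t populations associated with the t largest values of $Y_i$.

Let $\Omega''(\delta^*) = \{\theta : \theta_{[k-t+1]} - \theta_{[k-t]} \ge \delta^*\}$ and let

$$\Omega'(\delta^*) = \{\theta : \theta_{[1]} = \cdots = \theta_{[k-t]} = \theta_{[k-t+1]} - \delta^* = \cdots = \theta_{[k]} - \delta^*\}.$$

Theorem 4.1.

$$\inf_{\theta \in \Omega''(\delta^*)} P_\theta\{CS \mid R_3(n)\} = \inf_{\theta \in \Omega'(\delta^*)} P_\theta\{CS \mid R_3(n)\}.$$

Proof. It was shown in Theorem 2.3 that the pdf $g_n(y; \theta)$ of the sample median has monotone likelihood ratio in y, which implies that it is stochastically increasing in $\theta$. Using a theorem of Barr and Rizvi [2], it follows that, for $\theta \in \Omega''(\delta^*)$,

$$P_\theta\{CS \mid R_3(n)\} = P_\theta\{\max_{1 \le j \le k-t} Y_{(j)} < \min_{k-t+1 \le j \le k} Y_{(j)}\}$$

is a non-increasing function of $\theta_{[1]}, \ldots, \theta_{[k-t]}$ and a non-decreasing function of $\theta_{[k-t+1]}, \ldots, \theta_{[k]}$. Hence $P_\theta\{CS \mid R_3(n)\}$ attains its infimum when $\theta_{[1]}, \ldots, \theta_{[k-t]}$ attain their maximum possible values, while $\theta_{[k-t+1]}, \ldots, \theta_{[k]}$ attain their minimum possible values subject to $\theta_{[k-t+1]} - \theta_{[k-t]} \ge \delta^*$. The proof is thus completed.

Using the same notation as in Section 2, let $G_n(y; \theta_i)$ denote the cdf of the sample median $Y_i$ with parameter $\theta_i$. Since $\theta_i$ is the location parameter, $G_n(y; \theta_i) = G_n(y - \theta_i; 0)$ and is stochastically increasing, continuous in both y and $\theta_i$. For $\theta \in \Omega''(\delta^*)$,

$$P_\theta\{CS \mid R_3(n)\} = P_\theta\{\max_{1 \le j \le k-t} Y_{(j)} < \min_{k-t+1 \le j \le k} Y_{(j)}\}$$
$$= P_\theta\Big\{ \bigcup_{j=k-t+1}^{k} \Big[ \max_{1 \le \ell \le k-t} Y_{(\ell)} < Y_{(j)} < \min_{k-t+1 \le \ell \le k,\ \ell \ne j} Y_{(\ell)} \Big] \Big\}$$
$$= \sum_{j=k-t+1}^{k} \int_{-\infty}^{\infty} \prod_{\beta=1}^{k-t} G_n(y; \theta_{[\beta]}) \prod_{\alpha=k-t+1,\ \alpha \ne j}^{k} [1 - G_n(y; \theta_{[\alpha]})]\, dG_n(y; \theta_{[j]}).$$

In particular, for $\theta \in \Omega'(\delta^*) \subset \Omega''(\delta^*)$,

$$P_\theta\{CS \mid R_3(n)\} = t \int_{-\infty}^{\infty} G_n^{k-t}(y; \theta)\, \{1 - G_n(y; \theta + \delta^*)\}^{t-1}\, dG_n(y; \theta + \delta^*)$$
$$= t \int_{-\infty}^{\infty} G_n^{k-t}(y - \theta; 0)\, \{1 - G_n(y - \theta - \delta^*; 0)\}^{t-1}\, dG_n(y - \theta - \delta^*; 0)$$
$$= t \int_{-\infty}^{\infty} G_n^{k-t}(y + \delta^*; 0)\, \{1 - G_n(y; 0)\}^{t-1}\, dG_n(y; 0)$$

which is independent of the parameter $\theta$. Hence for specified values of $\delta^*$ and P* ($1/\binom{k}{t} < P^* < 1$), we can solve the equation

$$t \int_{-\infty}^{\infty} G_n^{k-t}(y + \delta^*; 0)\, \{1 - G_n(y; 0)\}^{t-1}\, dG_n(y; 0) = P^*$$

for n.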


5. Subset Selection with Respect to the Scale Parameter $\sigma$

Let $X_i$, i = 1,2,...,k, be k independent random variables from double exponential populations $\pi_i$, i = 1,2,...,k, respectively, with $\pi_i$ having the probability density function

$$f(x; \theta_i, \sigma_i) = \frac{1}{2\sigma_i} \exp[-|x - \theta_i|/\sigma_i], \quad -\infty < x < \infty,\ -\infty < \theta_i < \infty,\ \sigma_i > 0.$$

Take n independent observations from $\pi_i$, i = 1,2,...,k. From these data one wishes to select a subset containing the population with the largest $\sigma_i$. Let $\sigma_{[1]} \le \sigma_{[2]} \le \cdots \le \sigma_{[k]}$ be the ordered parameters. We consider two different cases.

Case (i): $\theta_i$'s are known.

In this case, the maximum likelihood estimator of $\sigma_i$ is $Y_i = \frac{1}{n} \sum_{j=1}^{n} |X_{ij} - \theta_i|$, which is distributed as a gamma variable with parameters n and $\sigma_i/n$, i.e. $Y_i$ has density

$$\frac{n}{\sigma_i\, \Gamma(n)} \left(\frac{ny}{\sigma_i}\right)^{n-1} e^{-ny/\sigma_i}, \quad y > 0.$$

Thus the problem reduces to the one considered by Gupta [12]. The selection procedure is

R: Select the population $\pi_i$ in the subset if and only if

$$Y_i \ge c \max_{1 \le j \le k} Y_j.$$

Case (ii): $\theta_i$'s are unknown.

When $\theta_i$ is unknown, it is well known that the maximum likelihood estimate of $\sigma_i$ is given by $\hat{\sigma}_i = \frac{1}{n} \sum_{j=1}^{n} |X_{ij} - \tilde{X}_i|$, where $\tilde{X}_i$ denotes the sample median from population $\pi_i$. For this problem, we propose the following selection procedure.

$R_4$: Select the population $\pi_i$ in the subset if and only if

$$\hat{\sigma}_i \ge c_4 \max_{1 \le j \le k} \hat{\sigma}_j$$

where $0 < c_4 < 1$ is so determined as to satisfy the basic probability requirement regardless of what the unknown $\theta_i$'s may be.

Let $V_i = n\hat{\sigma}_i/\sigma_i$, i = 1,...,k. Then

$$P(CS \mid R_4) = P\{\hat{\sigma}_{(k)} \ge c_4\, \hat{\sigma}_{(j)},\ j = 1, \ldots, k-1\} = \int_0^{\infty} \prod_{j=1}^{k-1} F_{V_{(j)}}\!\left(\frac{\sigma_{[k]}}{c_4\, \sigma_{[j]}}\, x\right) dF_{V_{(k)}}(x),$$

$$\inf_{\sigma \in \Omega'} P(CS \mid R_4) = \inf_{\sigma \in \Omega'_0} P(CS \mid R_4) = \int_0^{\infty} F_V^{k-1}\!\left(\frac{x}{c_4}\right) dF_V(x),$$

where $\Omega' = \{\sigma = (\sigma_1, \ldots, \sigma_k),\ \sigma_i > 0,\ i = 1, \ldots, k\}$, $\Omega'_0 = \{\sigma = (\sigma, \ldots, \sigma),\ \sigma > 0\}$ and $F_V(\cdot)$, $F_{V_{(j)}}(\cdot)$, j = 1,...,k, are the cdf's of $V = n\hat{\sigma}/\sigma$ and $V_{(j)} = n\hat{\sigma}_{(j)}/\sigma_{[j]}$, j = 1,...,k, respectively.

Hence if the distribution $F_V(\cdot)$ is known, then the constant $c_4$ can be determined by the equation

$$\int_0^{\infty} F_V^{k-1}\!\left(\frac{x}{c_4}\right) dF_V(x) = P^*.$$

The exact distribution $F_V$ of V is worked out for n = 3 by Bain and Engelhardt [1], and a chi-square approximation is also given by them which is quite good even for small n. However, it follows from Chernoff, Gastwirth and Johns [5] that $\frac{1}{\sqrt{n}}(V - n) = \sqrt{n}\left[\frac{\hat{\sigma}}{\sigma} - 1\right]$ is asymptotically a standard normal variable. When all $\sigma_i$ are identical,

$$P(CS \mid R_4) = P\{\hat{\sigma}_k \ge c_4\, \hat{\sigma}_j,\ j = 1, \ldots, k-1\}$$
$$= P\left\{\sqrt{n}\left(\frac{\hat{\sigma}_k}{\sigma} - 1\right) \ge c_4 \sqrt{n}\left(\frac{\hat{\sigma}_j}{\sigma} - 1\right) + \sqrt{n}\,(c_4 - 1),\ j = 1, \ldots, k-1\right\}$$
$$\cong \int_{-\infty}^{\infty} \Phi^{k-1}\!\left(\frac{x - \sqrt{n}\,(c_4 - 1)}{c_4}\right) d\Phi(x).$$


6. A Test of Homogeneity Based on the Sample Median Range

Let $\pi_1, \pi_2, \ldots, \pi_k$ be k independent double exponential populations such that the observations $X_{i1}, \ldots, X_{i,2n+1}$ from $\pi_i$ have density $\frac{1}{2} e^{-|x - \theta_i|}$, for i = 1,2,...,k. As before, let the sample median of these (2n+1) observations be denoted as $Y_i$, i = 1,...,k. In some practical situations one wishes to know whether the $\theta_i$ are significantly different or not. This problem is to test the homogeneity of the double exponential populations. We are interested in using a test based on the sample range of the Y's and hence we wish to derive the distribution of the sample median range $R = \max_{1 \le j \le k} Y_j - \min_{1 \le j \le k} Y_j$, considering all $\theta_i$ to be equal to a common unknown $\theta$. When the value of R is large, the hypothesis of homogeneity is rejected. We wish to find a constant r, such that $P(R > r) \le \alpha$ under the hypothesis $\theta_1 = \cdots = \theta_k = \theta$. This will provide an $\alpha$-level test.

Theorem 6.1. For $\alpha$, $0 < \alpha < 1$, let r be a constant such that

$$P_{H_0}\{Y_k \ge \max_{1 \le j \le k-1} Y_j - r\} = 1 - \frac{\alpha}{k}.$$

Then $P_{H_0}(R > r) \le \alpha$.

Proof. When $H_0$ is true,

$$P(R > r) = P\{\max_j Y_j - \min_j Y_j > r\}$$
$$\le k - \sum_{i=1}^{k} P\{Y_i \ge \max_j Y_j - r\}$$
$$= k - k\, P\{Y_k \ge \max_{1 \le j \le k-1} Y_j - r\}$$
$$= k - k\left(1 - \frac{\alpha}{k}\right) = \alpha.$$

The above theorem establishes a connection between the selection rule $R_1$ and the above test for equality of the $\theta$'s.


7. On the Distribution of the Statistic Associated with $R_1$

Let $Y_i$ (i = 0,1,...,p) be (p+1) independent and identically distributed random variables each representing the median in a random sample of size (2n+1) from a population with standard double exponential density function $f(x) = \frac{1}{2} e^{-|x|}$. Consider the differences $Z_i = Y_i - Y_0$ (i = 1,2,...,p). The random variables $Z_i$ (i = 1,2,...,p) are correlated and the distribution of the maximum of the $Z_i$ is of interest in problems of selection and ranking for double exponential distributions as explained earlier when discussing $R_1$. In this section, we give a closed form of the distribution of $Z = \max_{1 \le i \le p} Z_i$ for some special cases. We have also computed tables of the upper percentage points of $Z = \max_{1 \le i \le p} Z_i$ corresponding to the probability levels $\alpha = P^* = 0.75, 0.90, 0.95, 0.99$ for p = 1(1)9, n = 1(1)10.

For the special case p = 1 (k = 2), n = 1 (sample size = 3), straightforward integration gives the cdf of Z (see formulae (2.2), (2.3)) as

$$P(Z \le z) = \int_{-\infty}^{\infty} G_n(x + z)\, g_n(x)\, dx.$$


Again,  for  p = 1 (k=2),  n = 2 (sample  size  = 5). 


P(Z  < z) 


225  -^z 


5225 

”3^ 


-'4Z 

e 


+ 


10975  ,-3z 
1792 


All computations related to the tables given at the end of this chapter were made on a CDC 6500 using Gauss-Laguerre quadrature based on fifteen nodes to perform the numerical integration. Checks on the accuracy of the program for p = 1, n = 1 showed that these values seem to be correct to three decimal places.

8.  An  Example  and  Application  of  the  Selection  Rule  and  the  Test  of  Homogeneity 
We would like to illustrate the use of the selection procedure and the test of homogeneity of location parameters for double exponential distributions. It is known that the difference of two independent two-parameter exponential variables with the same scale parameter follows a double exponential distribution. Using a statistical package G6-RVP designed by H. Rubin and C. Hinkle at Purdue University, we generated a set of exponential random numbers, from which we obtained 5 sets of double exponential random numbers with location parameters $\theta_i$ to be 0, 2.5, 3.4, -2.0, -0.65. Let $\pi_i$ denote the double exponential population with location parameter $\theta_i$ and scale parameter 1.

The five pseudo-random samples from $\pi_i$, i = 1,...,5, are given as follows. In each case 9 observations were taken.

    pi_1       pi_2       pi_3       pi_4       pi_5
  -3.4839    -0.9839    -0.0839    -5.4839    -4.1339
  -2.6762    -0.1762     0.7238    -4.6762    -3.3262
  -0.3129     2.1871     3.0871    -2.3129    -0.9629
  -0.2264     2.2736     3.1736    -2.2264    -0.8764
  -0.1761     2.3239     3.2239    -2.1761    -0.8261
   0.1462     2.6462     3.5462    -1.8538    -0.5038
   0.3033     2.8033     3.7033    -1.6967    -0.3467
   1.6160     4.1160     5.0160    -0.3840     0.9660
   5.6924     8.1924     9.0924     3.6924     5.0424

Let $Y_i$ = sample median of the sample of size 9 from $\pi_i$. Then $Y_1 = -0.1761$, $Y_2 = 2.3239$, $Y_3 = 3.2239$, $Y_4 = -2.1761$, $Y_5 = -0.8261$. The procedure proposed for selecting a subset to include the population with the largest location parameter (or median) $\theta$ is

$R_1$: Select population $\pi_i$ iff $Y_i \ge \max_{1 \le j \le 5} Y_j - d$.

For P* = 0.95, the tabulated value of the selection constant d corresponding to k = 5 (p = 4) and n = 4 is 1.3210, hence $R_1$ reduces to

$R_1$: Select population $\pi_i$ iff $Y_i \ge 3.2239 - 1.3210 = 1.9029$

which selects populations $\pi_2$ and $\pi_3$ corresponding to the two largest values of the sample medians.

For the problem of testing the hypothesis H: $\theta_1 = \theta_2 = \cdots = \theta_5$, let $\alpha = 0.05$, so that $P^* = 1 - \alpha/k = 0.99$; again, we find from the table at the end of the paper that the critical value r is 1.7942. Since the sample median range

$$R = \max_{1 \le j \le 5} Y_j - \min_{1 \le j \le 5} Y_j = 3.2239 - (-2.1761) = 5.4$$

is greater than the critical value, the hypothesis H: $\theta_1 = \cdots = \theta_5$ is rejected at the 5% level of significance.
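The arithmetic of this example can be retraced in a few lines. The sketch below takes the five samples of Section 8 and the two tabulated constants d = 1.3210 and r = 1.7942 quoted in the text; the variable names are illustrative, and the second entry of the third sample (0.7238) is an assumption inferred from the constant shift of 3.4 between the first and third samples, since that entry is not legible in the source.

```python
import statistics

# Samples from Section 8, keyed by population index.
samples = {
    1: [-3.4839, -2.6762, -0.3129, -0.2264, -0.1761, 0.1462, 0.3033, 1.6160, 5.6924],
    2: [-0.9839, -0.1762, 2.1871, 2.2736, 2.3239, 2.6462, 2.8033, 4.1160, 8.1924],
    # Second value 0.7238 is inferred from the shift, not read from the source.
    3: [-0.0839, 0.7238, 3.0871, 3.1736, 3.2239, 3.5462, 3.7033, 5.0160, 9.0924],
    4: [-5.4839, -4.6762, -2.3129, -2.2264, -2.1761, -1.8538, -1.6967, -0.3840, 3.6924],
    5: [-4.1339, -3.3262, -0.9629, -0.8764, -0.8261, -0.5038, -0.3467, 0.9660, 5.0424],
}

d = 1.3210   # tabulated selection constant for k = 5, n = 4, P* = 0.95
r = 1.7942   # tabulated critical value for alpha = 0.05 (P* = 0.99)

medians = {i: statistics.median(xs) for i, xs in samples.items()}

# Rule R1: keep every population whose median is within d of the largest median.
threshold = max(medians.values()) - d
selected = sorted(i for i, y in medians.items() if y >= threshold)

# Homogeneity test: reject when the range of the sample medians exceeds r.
median_range = max(medians.values()) - min(medians.values())
reject = median_range > r

print(medians)
print(selected, round(median_range, 4), reject)
```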